ABSTRACT


My mother has been a diabetic for the last 15 years of her life, so I have 
long known the difference between the Fasting and PP (post-meal) sugar levels. 
When I found a public data set recording glucose levels alongside age, 
blood pressure, and BMI, I wondered whether I could map the relationship 
between these factors and work out how they affect the blood sugar levels. 

In this paper, I attempt to map the relation between the multiple factors 
affecting the blood sugar levels. The project is implemented in the R 
programming language, using RStudio. 
Goals: 

> Plotting the Glucose levels with the Age, BP, Insulin levels. 

> Removing the outliers from the data set. 

> Creating a model of the Glucose levels against the Age, BP, Insulin levels. 

> Plotting the predicted values of the Glucose levels. 
Flow of the Project 



> Start: Installation of the packages, and loading of the data. 

> Input: Data in the form of the csv file will be loaded and data points will be isolated. 

> Processing the data: Isolate the data and create the data frame of Diabetes. 

> Plot the data points: Plot the data points, specifically the glucose levels against the BP, Age and BMI. 

> Removal of Outliers: Identify the outliers in the data set and remove them from the data set. 

> Create the model of the data: Create multiple models of the data set. 

> Compare the models: Compare the different models created, find their AIC values, and use the one with the lowest value. 

> Predict the values from the model: Use the 'predict' function to predict the values from the selected model. 

> Plot the forecast values: Plot the 'Predicted' values. 

Packages Used 

> ggplot2: To plot the data points and the model. 

> forecast: To forecast the time series of the data set. 

> scales: Used to format the scales and labels of the plots. 

Flow with Plots 

#Loading the Input File: 

library(ggplot2) #Provides qplot() and ggplot(), which are used for the plots

DiabetesRawData<-read.csv(file.choose()) 

Load the raw data from the diabetes data set. 

The data set used in this project has been attached. 

#Plot of the Raw Data: 

#Plot of the Glucose Levels 

qplot(DiabetesRawData$Glucose,main ='Plot of the Glucose Levels of the Raw Data',xlab = 'Glucose Levels', colour=I("red")) 


Plot of the Glucose Levels of the Raw Data 
#Plotting the Glucose Levels 
Diabetes<-na.omit(DiabetesRawData) #Drop the records with missing values

ggplot(Diabetes,aes(x=Age,y=Glucose))+geom_point() 

ggplot(Diabetes,aes(x=Age,y=Glucose))+geom_point()+labs(x='Age',y='Glucose Level',title='Plot of the Age against Glucose Levels') 
#We have to remove the outliers from the graph; we will eliminate the records having Glucose levels of 50 or less. 
Diabetes<-Diabetes[Diabetes$Glucose>50,] 

ggplot(Diabetes,aes(x=Age,y=Glucose))+geom_point()+labs(x='Age',y='Glucose Level',title='Plot of the Age against Glucose Levels') 

Plot of the Age against Glucose Levels 
#Plot the Age against the BP, to isolate and eliminate the outlier values of BP 

ggplot(Diabetes,aes(x=Age,y=BloodPressure))+geom_point()+labs(x='Age',y='Blood Pressure Level',title='Plot of the Age against Blood Pressure Levels') 
Plot of the Age against Blood Pressure Levels 
Diabetes<-Diabetes[Diabetes$BloodPressure>25,] 

ggplot(Diabetes,aes(x=Age,y=BloodPressure))+geom_point()+labs(x='Age',y='Blood Pressure Level',title='Plot of the Age against Blood Pressure Levels') 


Plot of the Age against Blood Pressure Levels 
#Outliers have been removed for the BP 
#Checking for Insulin Levels 

ggplot(Diabetes,aes(x=Age,y=Insulin))+geom_point()+labs(x='Age',y='Insulin Level',title='Plot of the Age against Insulin Levels') 
Plot of the Age against Insulin Levels 
#Clearing the Outliers from the data set 

Diabetes<-Diabetes[Diabetes$Insulin<400,] 

Diabetes<-Diabetes[Diabetes$Insulin>5,] 

ggplot(Diabetes,aes(x=Age,y=Insulin))+geom_point()+labs(x='Age',y='Insulin Level',title='Plot of the Age against Insulin Levels') 


Plot of the Age against Insulin Levels 
#Checking the health of the BMI data 

ggplot(DiabetesRawData,aes(x=Age,y=BMI))+geom_point()+labs(x='Age',y='BMI',title='Plot of the Age against BMI') 
Plot of the Age against BMI 
Diabetes<-Diabetes[Diabetes$BMI<60,] 

Diabetes<-Diabetes[Diabetes$BMI>10,] 

#y is mapped to BMI so that the plot matches its axis label and title
ggplot(Diabetes,aes(x=Age,y=BMI))+geom_point()+labs(x='Age',y='BMI',title='Plot of the Age against BMI') 

Plot of the Age against BMI 
#Checking the health of the Diabetes Pedigree Function. 

ggplot(Diabetes,aes(x=Age,y=DiabetesPedigreeFunction))+geom_point()+labs(x='Age',y='Diabetes Pedigree Function',title='Plot of the Age against Diabetes Pedigree Function') 
Plot of the Age against Diabetes Pedigree Function 
Diabetes<-Diabetes[Diabetes$DiabetesPedigreeFunction<2.0,] 

ggplot(Diabetes,aes(x=Age,y=DiabetesPedigreeFunction))+geom_point()+labs(x='Age',y='Diabetes Pedigree Function',title='Plot of the Age against Diabetes Pedigree Function') 

Plot of the Age against Diabetes Pedigree Function 
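
The range filters above are applied one column at a time. As a minimal sketch, the same thresholds can be combined into a single logical condition. The 'toy' data frame and its values are invented for illustration and only reuse the column names from the code above; they are not taken from the attached data set.

```r
# Hypothetical stand-in for the Diabetes data frame; all values are invented.
toy <- data.frame(
  Glucose                  = c(0,    85,   120,  150,  99,   110,  135),
  BloodPressure            = c(70,   0,    72,   80,   66,   74,   82),
  Insulin                  = c(90,   100,  846,  130,  110,  120,  200),
  BMI                      = c(30.1, 25.4, 33.0, 67.1, 28.2, 27.0, 31.5),
  DiabetesPedigreeFunction = c(0.5,  0.3,  0.2,  0.4,  2.4,  0.6,  0.9)
)

# Same thresholds as the step-by-step filters, evaluated in one pass.
keep <- with(toy,
  Glucose > 50 &
  BloodPressure > 25 &
  Insulin > 5 & Insulin < 400 &
  BMI > 10 & BMI < 60 &
  DiabetesPedigreeFunction < 2.0
)

cleaned <- toy[keep, ]
print(nrow(cleaned))
```

Keeping the thresholds in one place makes it easier to see, and later adjust, which records are dropped and why.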




#Now that we have a cleaned-up data set, we can start creating models from it. We will train the 
#models on the 'Diabetes' data set, but we need a second data set to test them on, so we create one. 

DiabetesTest<-sapply(Diabetes,rep.int, times=3) 

DiabetesTest<-data.frame(DiabetesTest) 

#This creates a DiabetesTest data frame, which is repeated 3 times. This data set can be used to test the Model. 
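
Because DiabetesTest repeats the rows the models are trained on, predictions on it reflect the in-sample fit rather than performance on unseen records. One common alternative, sketched here under stated assumptions, is to hold out part of the data before fitting. The 'toy' data frame, the seed, and the 70/30 ratio are all illustrative choices and not part of the original project.

```r
set.seed(42)  # arbitrary seed so that the split is reproducible

# Hypothetical stand-in for the cleaned Diabetes data frame; values are random.
n <- 100
toy <- data.frame(
  Age     = sample(21:70, n, replace = TRUE),
  BMI     = runif(n, 18, 45),
  Glucose = runif(n, 60, 190)
)

# Hold out 30% of the rows for testing (assumed ratio).
test_idx <- sample(seq_len(n), size = round(0.3 * n))
train <- toy[-test_idx, ]  # rows used to fit the models
test  <- toy[test_idx, ]   # rows the models never see during fitting

print(c(train = nrow(train), test = nrow(test)))
```

With such a split, the models would be fitted on 'train' and the predictions compared with the observed Glucose values in 'test'.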

DiabetesModel1<-glm(Glucose~BMI*Insulin*Age*BloodPressure,data = Diabetes) 
DiabetesModel2<-glm(Glucose~BMI+Insulin+Age+BloodPressure,data = Diabetes) 
DiabetesModel3<-glm(Glucose~BMI:Insulin:Age:BloodPressure,data = Diabetes) 
DiabetesModel4<-glm(Glucose~BMI*Insulin*Age:BloodPressure,data = Diabetes) 
DiabetesModel5<-glm(Glucose~BMI*Insulin*Age+BloodPressure,data = Diabetes) 
DiabetesModel6<-glm(Glucose~BMI*Insulin+Age+BloodPressure,data = Diabetes) 

DiabetesModel7<-glm(Glucose~BMI*Insulin:Age+BloodPressure,data = Diabetes) 
DiabetesModel8<-glm(Glucose~BMI+Insulin*Age*BloodPressure,data = Diabetes) 
DiabetesModel9<-glm(Glucose~BMI:Insulin*Age*BloodPressure,data = Diabetes) 
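
The nine models differ only in how the formula operators '+', ':' and '*' combine the predictors. As a small sketch, base R's terms() can list the terms that a formula expands to, without needing any data. The helper name 'expand_terms' is hypothetical and introduced only for this illustration.

```r
# List the model terms that a formula expands to (no data required).
# 'expand_terms' is a hypothetical helper name used only in this sketch.
expand_terms <- function(f) attr(terms(f), "term.labels")

main_only   <- expand_terms(Glucose ~ BMI + Insulin)  # '+' adds main effects
inter_only  <- expand_terms(Glucose ~ BMI:Insulin)    # ':' is the interaction alone
main_inter  <- expand_terms(Glucose ~ BMI * Insulin)  # '*' is main effects plus interaction

# The formula used for DiabetesModel4; ':' binds more tightly than '*'.
model4_terms <- expand_terms(Glucose ~ BMI * Insulin * Age:BloodPressure)

print(main_inter)
print(model4_terms)
```

Since ':' is evaluated before '*', the DiabetesModel4 formula treats Age:BloodPressure as a single term, so Age and BloodPressure do not enter as separate main effects. This is consistent with the df of 9 reported for that model: seven terms, the intercept, and the dispersion parameter.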

#We will do a multiplot of the models, and then compare them by using the AIC. 
#multiplot() for fitted models is assumed here to come from the coefplot package. 

multiplot(DiabetesModel1,DiabetesModel2,DiabetesModel3) 

AIC(DiabetesModel1,DiabetesModel2,DiabetesModel3,DiabetesModel4,DiabetesModel5,DiabetesModel6,DiabetesModel7,DiabetesModel8,DiabetesModel9) 

               df      AIC 
DiabetesModel1 17 3374.796 
DiabetesModel2  6 3365.353 
DiabetesModel3  3 3423.790 
DiabetesModel4  9 3363.508 
DiabetesModel5 10 3365.492 
DiabetesModel6  7 3367.345 
DiabetesModel7  6 3399.525 
DiabetesModel8 10 3368.397 
DiabetesModel9  9 3380.641 
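
Reading the lowest AIC off the table works for nine models, but the selection can also be done in code. The sketch below fits three small models on invented data, sorts them by AIC, and reports each model's difference from the best one. The data, the model names, and the seed are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed for the invented data

# Invented data: y depends on x1 only.
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 100 + 5 * toy$x1 + rnorm(200)

# Keeping the models in a named list makes them easy to compare.
models <- list(
  additive    = glm(y ~ x1 + x2, data = toy),
  interaction = glm(y ~ x1 * x2, data = toy),
  x1_only     = glm(y ~ x1,      data = toy)
)

aic_values <- sort(sapply(models, AIC))  # lowest AIC first
best_name  <- names(aic_values)[1]       # name of the selected model
delta      <- aic_values - aic_values[1] # difference from the best model

print(best_name)
print(round(delta, 2))
```

The differences matter as much as the ranking. In the table above, DiabetesModel4 and DiabetesModel2 are less than 2 AIC units apart, and a common rule of thumb treats a gap that small as weak evidence for preferring one model over the other.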

#DiabetesModel4 has the lowest AIC of the nine models, so it is used to predict the values. 
#theme_economist() comes from the ggthemes package. 
ggplot(DiabetesModel4, aes(x=Age,y=Glucose))+geom_point()+theme_economist() 
#Predict the values of the Model, based on a test data set 
DiabetesPrediction<-predict(DiabetesModel4,DiabetesTest) 

DiabetesPrediction<-predict(DiabetesModel4,DiabetesTest,se.fit = TRUE) #se.fit = TRUE also returns the standard errors of the predicted values. 
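
With se.fit = TRUE, predict() on a glm returns a list rather than a plain vector, with the components fit, se.fit and residual.scale. As a rough sketch on invented data, the standard errors can be turned into an approximate 95% band around each fitted mean. The toy data, the single-predictor model, and the normal-approximation multiplier 1.96 are assumptions for illustration.

```r
set.seed(7)  # arbitrary seed for the invented data

# Invented data with a linear dependence of Glucose on Age.
toy <- data.frame(Age = runif(80, 21, 70))
toy$Glucose <- 90 + 0.6 * toy$Age + rnorm(80, sd = 15)

fit <- glm(Glucose ~ Age, data = toy)

# Two hypothetical new records to predict for.
new_rows <- data.frame(Age = c(30, 50))
pred <- predict(fit, newdata = new_rows, se.fit = TRUE)

# Approximate 95% band for the mean response (normal approximation).
result <- data.frame(
  new_rows,
  fit   = pred$fit,
  lower = pred$fit - 1.96 * pred$se.fit,
  upper = pred$fit + 1.96 * pred$se.fit
)
print(result)
```

This band describes the uncertainty in the fitted mean, not the spread of individual glucose readings, which would be wider.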

#Plot of the Predicted Glucose Levels 

#The predicted values are in the 'fit' component of DiabetesPrediction
qplot(DiabetesPrediction$fit,main ='Plot of the Predicted Values of Blood Glucose Levels against the number of readings',xlab = 'Predicted Glucose Levels', colour=I("red")) 
Plot of the Predicted Values of Blood Glucose Levels against the number of readings 
Conclusion 

> Loaded the data. 

> Removed the outliers. 

> Plotted the data points. 

> Mapped the relation between the factors influencing the blood glucose levels. 

> Created the models. 

> Compared the multiple models. 

> Predicted the Glucose levels based on the selected model. 

> Plotted the range of predicted values. 